Abstract

Real-time heuristic search algorithms are suitable for situated agents that need to make their decisions 
in constant time. Since the original work by Korf nearly two decades ago, numerous extensions have 
been suggested. One of the most intriguing extensions is the idea of backtracking wherein the agent 
decides to return to a previously visited state as opposed to moving forward greedily. This idea has 
been empirically shown to have a significant impact on various performance measures. The studies 
have been carried out in particular empirical testbeds with specific real-time search algorithms that use 
backtracking. Consequently, the extent to which the trends observed are characteristic of backtracking in 
general is unclear. In this paper, we present the first entirely theoretical study of backtracking in real-time 
heuristic search. In particular, we present upper bounds on the solution cost exponential and linear in a 
parameter regulating the amount of backtracking. The results hold for a wide class of real-time heuristic 
search algorithms that includes many existing algorithms as a small subclass. 

Keywords: real-time heuristic search, agent-centered search. 

1 Introduction 

In this paper we study the problem of agent-centered real-time heuristic search (Koenig, 2001). The dis- 
tinctive property of such search is that an agent must repeatedly plan and execute actions within a constant 
time interval that is independent of the size of the problem being solved. This restriction severely limits the 
range of applicable algorithms. For instance, static search algorithms (e.g., A* of Hart, Nilsson, & Raphael, 
1968), re-planning algorithms (e.g., D* of Stentz, 1995), anytime algorithms (e.g., ARA* of Likhachev,
Gordon, & Thrun, 2004) and anytime re-planning algorithms (e.g., AD* of Likhachev, Ferguson, Gordon, 
Stentz, & Thrun, 2005) cannot guarantee a constant bound on planning time per action. LRTA* provides 
such guarantees by planning only a few actions at a time and updating its heuristic function, but the solution 
quality can be poor (Korf, 1990; Ishida, 1992). 

As a motivating application, consider navigation in gridworld maps in commercial computer games. In 
such games, an agent can be tasked to go to any location on the map from its current location. The agent 
must react quickly to the user's command regardless of the map's size and complexity. Consequently, game 
companies impose a time-per-action limit on their pathfinding algorithms. As an example, Bioware Corp., a
major game company, limits planning time to 1-3 ms for all pathfinding units (and there can be many units 
planning simultaneously). 

The original real-time search algorithms, RTA* and LRTA*, form a local search space (LSS) around the 
agent's current state. Then they greedily take an action toward the most promising state on the frontier of 
the LSS. A large number of subsequent real-time heuristic search algorithms have followed this canon (e.g., 
Russell & Wefald, 1991; Furcy & Koenig, 2000; Shimbo & Ishida, 2003; Koenig, 2004; Hernandez & 
Meseguer, 2005a, 2005b; Koenig & Likhachev, 2006; Rayner, Davison, Bulitko, Anderson, & Lu, 2007; 
Bulitko, Sturtevant, Lu, & Yau, 2007; Bulitko, Lustrek, Schaeffer, Bjornsson, & Sigmundarson, 2008). 
Arguably, the most radical departure was the introduction of so-called backtracking moves by Shue and
Zamani (1993a, 1993b) and Shue, Li, and Zamani (2001). Their impact on the performance of real-time heuristic
search, and in particular on the cost of the solution the agent finds, has been studied mostly empirically (Shue &
Zamani, 1993a, 1993b; Shue et al., 2001; Bulitko, 2004; Bulitko & Lee, 2006; Sigmundarson & Bjornsson,
2006). As a result, it is unclear to what extent the reported findings and trends are specific to the particular
algorithms and/or to the testbed environments used.

The contribution of this paper is an entirely theoretical investigation of effects of backtracking on real- 
time search performance. We describe a theoretical framework that generalizes a broad class of existing real- 
time search algorithms. We show that in the worst case, solution cost can be exponential in the parameter 
controlling the amount of backtracking. We then identify a special case that affords linear solution cost. 
Because we consider real-time heuristic search on general graphs, the results of our study are domain- 
independent and, thus, broadly applicable. 

The rest of the paper is organized as follows. We first informally review the pioneering LRTA* algo- 
rithm and introduce the notion of backtracking in Section 2. The search problem and performance metrics 
are formally defined in Section 3. Section 4 introduces our framework of real-time search. We then use the 
framework to derive properties responsible for exponential (Section 5.1) and linear (Section 5.2) solution 
cost. We review existing theoretical work in Section 6. The paper is concluded with a discussion of limita- 
tions and future work directions. Note that there are no proofs in this version of the paper. We are working 
on polishing their presentation for a future version. 

2 Backtracking in Real-time Heuristic Search 

To begin, we present the original real-time search algorithm, Learning Real-Time A* or LRTA* (Korf, 1990),
which constitutes the core of most modern real-time search algorithms. In the current state $s$, LRTA* with a
lookahead of one considers the immediate neighbors (lines 4-5 in Figure 2.1). For each neighbor state, two
values are computed: the distance of getting there from the current state (henceforth denoted by $g$) and the
heuristic estimate $h$ of the distance to the closest goal state from the neighbor state. LRTA* selects the
neighbor with the lowest $f = g + h$ value (line 5), updates the heuristic value of the current state if
the minimum $f$-value is higher (lines 6-7), and then travels to the selected neighbor (line 9). The process
repeats until a goal state is reached.

This paper analyzes the role of backtracking moves, which were introduced in an algorithm called Search
and Learning A*, or SLA* (Shue & Zamani, 1993a, 1993b). SLA* is based on the LRTA* algorithm
with a lookahead of one described above, with one notable difference. Namely, whenever the heuristic
value of the current state is updated (line 7 in Figure 2.2), the algorithm returns to the previous state (line 8).
Otherwise (i.e., when there is no learning), SLA* proceeds to the most promising successor state using the
same rule as LRTA* (line 10). Naturally, all backtracking moves incur the travel cost, as do regular (forward)
moves.
LRTA*

1 initialize the heuristic: $h \leftarrow h_0$
2 reset the current state: $s \leftarrow s_{\text{start}}$
3 while $s \notin S_g$ do
4    generate successor states of state $s$
5    among them find the state $s'$ with the lowest $f = g + h$
6    if $f(s') > h(s)$
7       update $h(s)$ to $f(s')$
8    end if
9    execute an action to get to $s'$
10 end while

Figure 2.1: The LRTA* algorithm with a lookahead of one.



SLA*

1 initialize the heuristic: $h \leftarrow h_0$
2 reset the current state: $s \leftarrow s_{\text{start}}$
3 while $s \notin S_g$ do
4    generate successor states of state $s$
5    among them find the state $s'$ with the lowest $f = g + h$
6    if $f(s') > h(s)$
7       update $h(s)$ to $f(s')$
8       execute an action to return to the previous state
9    else
10      execute an action to get to $s'$
11   end if
12 end while

Figure 2.2: The SLA* algorithm. 
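
The two figures above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function name `real_time_search`, the dictionary-based graph encoding, the first-wins tie-breaking, and the choice to stay put when SLA* would backtrack from the start state are all assumptions. The test data is the four-state chain used later in Example 1.

```python
def real_time_search(graph, h_init, start, goals, backtrack=False, max_steps=1000):
    """LRTA* with a lookahead of one; backtrack=True switches to the SLA* rule.

    graph[s] maps each successor of s to the cost of the edge leading to it.
    Returns (visited states in order, final path stack, learned heuristic).
    """
    h = dict(h_init)      # learned heuristic (copied, the input stays untouched)
    stack = [start]       # path from the start state to the current state
    visited = [start]     # every state the agent occupies, in order
    for _ in range(max_steps):
        s = stack[-1]
        if s in goals:
            return visited, stack, h
        # Successor with the lowest f = g + h (assumption: first one wins ties).
        best = min(graph[s], key=lambda n: graph[s][n] + h[n])
        f_best = graph[s][best] + h[best]
        learned = f_best > h[s]
        if learned:
            h[s] = f_best
        if learned and backtrack:
            # SLA*: return to the previous state (assumption: stay put at the start).
            if len(stack) > 1:
                stack.pop()
                visited.append(stack[-1])
        else:
            stack.append(best)
            visited.append(best)
    raise RuntimeError("step limit reached before a goal state")


# Four-state chain A - B - C - D with unit edge costs, goal A, start C.
chain = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1, "D": 1}, "D": {"C": 1}}
h_init = {"A": 0, "B": 1, "C": 1, "D": 0.7}

lrta_visited, lrta_stack, _ = real_time_search(chain, h_init, "C", {"A"})
sla_visited, sla_stack, _ = real_time_search(chain, h_init, "C", {"A"}, backtrack=True)
```

On this tiny problem both variants physically visit the same states, but the SLA* stack ends up shorter because the excursion to D is popped when the agent backtracks.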

The backtracking mechanism provides the agent with two opportunities: (i) to update the heuristic 
value of the previous state and (ii) possibly select a different action in the previous state. An alternative 
scheme would be to update heuristic values of previously visited states in memory (i.e., without changing 
the agent's current state and incurring action cost). This is the approach used by some other real-time search 
algorithms (Hernandez & Meseguer, 2005b; Sigmundarson & Bjornsson, 2006) and it does not give the 
agent the opportunity to select a different action. 

3 Definitions and Notation 

In this section we axiomatically introduce the problem and the formal notation used throughout the rest of 
the paper. 

Definition 1 A search space is defined by a connected directed weighted graph $G = (S, E, w)$ and a
subset of absorbing (goal) states $S_g \subset S$. Elements of $S$ are states of the search problem and the vertices
of the graph. $E$ is the set of directed edges, $E \subset S \times S$, defining the transitions in the state space. Each
edge/transition originating in the state $s$ corresponds to an action the search agent can take in $s$. The function
$w : E \to \{\epsilon n \mid n \in \mathbb{N}, n > 0\}$ specifies the edge weights (i.e., action costs). Here $\epsilon \in \mathbb{R}^+$ is a positive
constant. Henceforth, $\mathbb{R}^+$ is the set of positive real numbers and $\mathbb{R}^+_0 = \mathbb{R}^+ \cup \{0\}$.

Note that unlike the standard definition (e.g., Bulitko & Lee, 2006), in this paper we consider a more general 
case of a possibly infinite search space and multiple goal states. 

Definition 2 Distance $\mathrm{dist}(s_1, s_2)$, $s_1, s_2 \in S$, is defined as the minimum cumulative weight of a path
originating in $s_1$ and ending in $s_2$. Such a path is called the shortest path. We generalize this definition
for arbitrary sets of states $S', S'' \subset S$ as follows: $\mathrm{dist}(S', S'') = \min_{s' \in S', s'' \in S''} \mathrm{dist}(s', s'')$. Thus,
$\forall s_1, s_2 \in S$, $\mathrm{dist}(s_1, s_2) = \mathrm{dist}(\{s_1\}, \{s_2\})$. Distance to goal of the state $s$ is defined as
$h^*(s) = \mathrm{dist}(s, S_g) = \min_{s_g \in S_g} \mathrm{dist}(s, s_g)$.

Definition 3 Edge-distance $\|s, s'\|$ is the minimum number of edges among all shortest paths between the
states $s$ and $s'$. For any $S', S'' \subset S$, we define $\|S', S''\| = \min_{s' \in S', s'' \in S''} \|s', s''\|$.

Definition 4 A heuristic search problem is defined by the search space $G = (S, E, w)$, the goal subset $S_g$, an
initial heuristic $h_{\text{init}}$, and an initial state $s_0$. The initial heuristic function $h_{\text{init}}$ (e.g., Manhattan distance) is a
mapping from $S$ to $\{\epsilon n \mid n \in \mathbb{N}, n \geq 0\}$. The heuristic $h_{\text{init}}(s)$ is an estimate of $h^*(s)$ and is available to
the agent. A heuristic is called $\theta$-admissible if $\forall s \in S \, [h(s) \leq \theta h^*(s)]$; $\theta$ is a positive constant. A heuristic
is called consistent if for any two states $a$ and $b$, $|h(a) - h(b)|$ does not exceed $\mathrm{dist}(a, b)$.

Definition 5 A heuristic search agent operates as follows. It starts in the initial state $s_0$ and traverses the
graph by taking actions (i.e., directed edges of the graph) until it enters a goal state.

In this paper we do not consider resetting the agent back to its start state upon reaching a goal state and 
having it solve the problem again. In other words, we are concerned with the first solution only and do not 
consider the learning process over multiple trials (known as convergence). 

Axiom 1 The search problem satisfies the following conditions for some $\theta \in \mathbb{R}$, $\theta > 0$:

$\forall s \in S \; \exists s' \in S_g \; [\mathrm{dist}(s, s') < \infty]$;   (3.1)
$\forall s \in S \; [\,|\{s' \mid (s, s') \in E\}| < \infty]$;   (3.2)
$\forall s \in S \; [h_{\text{init}}(s) \leq \theta \, \mathrm{dist}(s, S_g)]$.   (3.3)

Condition 3.1 postulates that a goal state is reachable from every state. Next, condition 3.2 states that the
number of actions available in any state is finite (i.e., each vertex of the search graph has a finite out-degree).
Finally, condition 3.3 stipulates that the initial heuristic $h_{\text{init}}$ is $\theta$-admissible. These conditions are needed to
ensure completeness (defined below) of the algorithms covered by our framework in the next section.
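
For a small, explicitly stored graph, conditions 3.1 and 3.3 can be checked mechanically. The sketch below is an illustration under assumptions that go beyond the paper: the graph is finite and given as a dictionary (so condition 3.2 holds trivially), and the helper names `distance_to_goal` and `satisfies_axiom_1` are hypothetical. It computes $h^*(s) = \mathrm{dist}(s, S_g)$ with Dijkstra's algorithm on the reversed graph.

```python
import heapq


def distance_to_goal(graph, goals):
    """h*(s) = dist(s, S_g) for every state that can reach a goal."""
    # Reverse every edge so that one search from the goals reaches all states.
    reverse = {s: {} for s in graph}
    for s, successors in graph.items():
        for n, w in successors.items():
            reverse.setdefault(n, {})[s] = w
    h_star = {g: 0 for g in goals}
    queue = [(0, g) for g in goals]
    heapq.heapify(queue)
    while queue:
        d, s = heapq.heappop(queue)
        if d > h_star.get(s, float("inf")):
            continue  # stale queue entry
        for p, w in reverse.get(s, {}).items():
            if d + w < h_star.get(p, float("inf")):
                h_star[p] = d + w
                heapq.heappush(queue, (d + w, p))
    return h_star


def satisfies_axiom_1(graph, goals, h_init, theta):
    """Check conditions 3.1 and 3.3 on a finite, explicitly stored graph."""
    h_star = distance_to_goal(graph, goals)
    reachable = all(s in h_star for s in graph)                  # condition 3.1
    return reachable and all(h_init[s] <= theta * h_star[s] for s in graph)  # 3.3


# Four-state chain A - B - C - D with unit edge costs and goal A.
chain = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1, "D": 1}, "D": {"C": 1}}
h_init = {"A": 0, "B": 1, "C": 1, "D": 0.7}
ok = satisfies_axiom_1(chain, {"A"}, h_init, theta=1)
```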

Definition 6 (Completeness). A search algorithm is complete for a search problem if and only if it neces- 
sarily reaches a goal state after a finite number of edge traversals. 

4 A Framework of Learning Real-time Heuristic Search 

This section presents a framework of real-time heuristic search which we subsequently use in our analysis. 
We introduce the framework axiomatically and then illustrate it with examples. 
4.1 Search Framework

Definition 7 We call $S' \subset S$ a separating set for the state $s$ if every shortest path from $s$ to every goal state
$s_g \in S_g$, if it exists, passes through $S'$.

From Condition 3.1 of Axiom 1, it follows that any non-goal state has a non-empty separating set (we can 
define S' as a set of states from all shortest paths between s and all goal states reachable from s). 

Definition 8 Suppose $S' \subset S$ and $s \in S$; then $\mathfrak{D}(s, S')$ denotes the set of all subsets of $S'$ which happen to
be separating sets for $s$.

Clearly, for any non-goal state $s$, $\mathfrak{D}(s, S)$ contains at least one element.

Definition 9 We define the border of any set $T \subset S$ as $\partial T = \{s \in T \mid \exists s' \in S \setminus T$ such that the edge
$(s, s') \in E\}$. States in $T \setminus \partial T$ are called inner states of $T$.

Clearly, the border of any set $T$ that does not contain inner goal states is a separating set for an inner state
$s$: $S_g \cap T \subset \partial T \Rightarrow \forall s \, [s \in T \setminus \partial T \Rightarrow \partial T \in \mathfrak{D}(s, T)]$. An example is found in Figure 4.1.

Figure 4.1: A gridworld illustration of Definition 9. The states in $S$ are 30 grid cells in the 6 × 5 grid world.
The two goal states are marked as $g_1$ and $g_2$. States in $T$ are marked with circles. Light circles represent the
inner states of $T$. Shaded circles form the border $\partial T$.
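
Definitions 7 and 9 translate directly into code for a finite graph. The sketch below is illustrative only; the helper names and the dictionary graph encoding are assumptions. The separating-set test relies on the observation that a set intersects every shortest path to a goal exactly when deleting the set changes (or destroys) the shortest distance to that goal.

```python
import heapq


def border(T, graph):
    """States of T with at least one edge leaving T (Definition 9)."""
    return {s for s in T if any(n not in T for n in graph.get(s, {}))}


def shortest_distances(graph, source, removed=frozenset()):
    """Dijkstra from source, ignoring every state in `removed`."""
    if source in removed:
        return {}
    dist = {source: 0}
    queue = [(0, source)]
    while queue:
        d, s = heapq.heappop(queue)
        if d > dist[s]:
            continue  # stale queue entry
        for n, w in graph.get(s, {}).items():
            if n not in removed and d + w < dist.get(n, float("inf")):
                dist[n] = d + w
                heapq.heappush(queue, (d + w, n))
    return dist


def is_separating_set(candidate, s, graph, goals):
    """True if every shortest path from s to every reachable goal hits `candidate`."""
    full = shortest_distances(graph, s)
    cut = shortest_distances(graph, s, removed=frozenset(candidate))
    # A shortest path avoiding the candidate exists iff the distance survives the cut.
    return all(cut.get(g) != full[g] for g in goals if g in full)


# Four-state chain A - B - C - D with unit edge costs and goal A.
chain = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1, "D": 1}, "D": {"C": 1}}
T = {"B", "C", "D"}
dT = border(T, chain)   # only B has an edge leaving T
inner = T - dT          # C and D are inner states
```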

We will now define a family of algorithms covered by our analysis. 

Definition 10 The search framework in Figure 4.2 implicitly defines a class of search algorithms. Any
member of this class will be referred to as search algorithm $\pi(\theta, T)$ and is invoked by a search agent at
discrete time steps.

The fundamental part of the family of algorithms is the concept of a stack. The stack is used to represent
the path from the start state to the agent's current state. By analyzing the stack's evolution, we will be able to
formulate bounds on the algorithm's performance. We define the stack notation below and then walk through
the framework line by line.

Definition 11 The stack is a first-in-last-out data structure that maintains the path found by the agent from its
start state to its current state. The notation $\sigma_t = [s_1 \ldots s_n]$ means that the stack $\sigma$ contains states $s_1, \ldots, s_n$
at time $t$; state $s_1$ is the start state, state $s_n$ is the current state. The top of the stack $\sigma = [s_1 \ldots s_n]$ is $s_n$
Search algorithm $\pi(\theta, T)$

1 reset the cycle counter: $t \leftarrow 0$
2 initialize the heuristic: $h_0 \leftarrow h_{\text{init}}$
3 reset the stack: $\sigma_0 \leftarrow [s_0]$
4 reset the learning amount: $u_0 \leftarrow 0$
5 while $\mathrm{top}(\sigma_t) \notin S_g$ do
6    generate a local search space $\Gamma(\sigma_t)$
7    compute the heuristic weight $\gamma(\sigma_t)$ and the new heuristic $h_{t+1}$
8    update the learning amount $u_{t+1}$
9    take the action and compute the new stack $\sigma_{t+1}$
10   advance the cycle counter: $t \leftarrow t + 1$
11 end while

Figure 4.2: The real-time search framework analyzed in this paper.

and is denoted by $\mathrm{top}(\sigma)$. Furthermore, if $\sigma$ and $\sigma'$ are stacks then their concatenation $\sigma\sigma'$ is a stack. For
brevity, we will use this notation for a concatenation of a stack with a single state (e.g., $\sigma s$). If $\sigma = s_1 \ldots s_n$
then $\sigma|^k = s_1 \ldots s_k$ and $\sigma|_k = s_{n-k+1} \ldots s_n$ (i.e., the first and the last $k$ elements of an $n$-element stack
respectively). Notation $\sigma|_a^b$, $a \leq b$, represents stack elements $s_a, \ldots, s_b$. Finally, $|\sigma|$ stands for the number
of elements in the stack $\sigma$.

During the initialization (lines 1-4 in Figure 4.2), the cycle counter is reset to 0 and the start state $s_0$ is
pushed onto the empty stack $\sigma_0$. The heuristic function at time 0 (denoted by $h_0$) is set to the initial heuristic
$h_{\text{init}}$. Finally, a real-valued counter, called learning amount $u_0$, is cleared. On each cycle (lines 6-10),
the algorithm goes through planning, learning, and execution phases. In the planning phase (line 6), the
algorithm examines the states around the current state. The set $\Gamma(\sigma_t)$ of the neighboring states examined is
henceforth called the local search space. $\Gamma(\sigma_t)$ can be defined in many ways. The results in this paper apply to
any definition of $\Gamma(\sigma_t)$ that satisfies the forthcoming Axiom 2.

Learning happens in lines 7 and 8 where the algorithm computes the heuristic weight $\gamma(\sigma_t)$, the new
heuristic function $h_{t+1}$ and the new learning amount $u_{t+1}$. Again, in the interest of covering as many
heuristic update rules as possible, we do not specify a particular rule. Any learning rule satisfying Axiom 2
is covered by our analysis.

In line 9, the agent executes its move by updating its stack from $\sigma_t$ to $\sigma_{t+1}$. We say that the agent takes
a forward move if $|\sigma_{t+1}| = |\sigma_t| + 1$ (i.e., the stack grows) and a backward move if $|\sigma_{t+1}| = |\sigma_t| - 1$ (i.e.,
the stack shrinks).

Note that real-time search algorithms can execute several actions per planning cycle. In our
framework, the stack $\sigma_t$ stores only the states in which planning was carried out. As an illustration, consider
the gridworld example in Figure 4.3. Suppose a path-finding agent starts out in a state with the coordinates
(0, 0) and plans three moves: two right moves followed by a down move. This means that by the time
it plans again, it will have visited the states (1, 0), (2, 0), (2, 1). In this paper, we discard the intermediate
states (1, 0) and (2, 0) and update our stack from $\sigma_0 = [(0, 0)]$ to $\sigma_1 = [(0, 0), (2, 1)]$. This simplification is
strictly for notational convenience and the discarded states are indeed visited by the agent.

We increment the cycle counter in line 10. The loop is terminated as soon as a goal state appears on top 
of the stack (line 5). 
Figure 4.3: Left: the initial state and the corresponding stack $\sigma_0$. Right: the next state after three moves are
taken and the resulting stack $\sigma_1$.

4.2 Conditions 

The framework in Figure 4.2 defines a broad class of search algorithms. In order to prove completeness and
upper bounds on solution cost, we introduce the following restrictions.

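Axiom 2 Suppose the learning quota $T \in \mathbb{R}^+_0 \cup \{\infty\}$ and the admissibility weight $\theta \in \mathbb{R}^+$ are control
parameters. Then the following conditions are imposed on the search framework in Figure 4.2 for any step
$t$:

$\mathrm{top}(\sigma_t) \notin S_g \Rightarrow \mathfrak{D}(\mathrm{top}(\sigma_t), \Gamma(\sigma_t)) \neq \emptyset$   (4.1)

$0 < \gamma(\sigma_t) \leq \gamma$   (4.2)

$|\sigma_{t+1}| > |\sigma_t| \Rightarrow \exists s \in \Gamma(\sigma_t) \, [\sigma_{t+1} = \sigma_t s \;\&\; h_{t+1}(\mathrm{top}(\sigma_t)) \geq \gamma(\sigma_t) \, \mathrm{dist}(\mathrm{top}(\sigma_t), s) + h_{t+1}(s)]$   (4.3)

$|\sigma_{t+1}| < |\sigma_t| \Rightarrow h_{t+1}(\mathrm{top}(\sigma_t)) > h_t(\mathrm{top}(\sigma_t))$   (4.4)

$\forall s \in S \, [h_t(s) \leq \theta h^*(s)]$   (4.5)

$\forall s \in \Gamma(\sigma_t) \, [h_{t+1}(s) \geq h_t(s)] \;\&\; \forall s \notin \Gamma(\sigma_t) \, [h_{t+1}(s) = h_t(s)]$   (4.6)

let $V_t = \{s_0\}$ if $|\sigma_{t+1}| \geq |\sigma_t|$, and $V_t = \{s_0, \mathrm{top}(\sigma_t)\}$ if $|\sigma_{t+1}| < |\sigma_t|$;
then $T \geq u_{t+1}$ and $u_{t+1} = u_t + \sum_{s \notin V_t} [h_{t+1}(s) - h_t(s)]$   (4.7)

The meaning of the conditions is as follows: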

Condition 4.1 requires that the lookahead search space $\Gamma(\sigma_t)$ contain a separating set for the current
non-goal state $\mathrm{top}(\sigma_t)$.

Condition 4.2 places an upper bound on the dynamically selected heuristic weight $\gamma(\sigma_t)$.

Condition 4.3 mandates that whenever the stack grows, the new stack is produced by pushing a single state
$s$ from the local search space $\Gamma(\sigma_t)$ on the previous stack. Additionally, the updated heuristic value
of the current state $\mathrm{top}(\sigma_t)$ must be lower-bounded by the distance from the current state $\mathrm{top}(\sigma_t)$ to the
new state $s$ weighted by $\gamma(\sigma_t)$ plus the distance from $s$ to goal, as estimated by $h_{t+1}(s)$.

Condition 4.4 requires the heuristic value of the current state to strictly increase whenever backtracking
(i.e., popping the stack) occurs.

Condition 4.5 requires the heuristic to be $\theta$-admissible at all times.

Condition 4.6 postulates that the heuristic function cannot change outside of the lookahead search space
$\Gamma(\sigma_t)$ and that it can only increase inside the space.

Condition 4.7 states that the sum of all increments in the heuristic function over all states except the
exclusion set $V_t$ does not exceed the learning quota $T$. It also requires that the learning amount increases
from $u_t$ to $u_{t+1}$ by the cumulative increment in the heuristic function from time $t$ to time $t + 1$. The
exclusion set $V_t$ always contains the start state $s_0$. If the move executed at time $t$ is a backtracking
move, it also contains the current state $\mathrm{top}(\sigma_t)$.

Conditions 4.3 and 4.7 imply that the agent is allowed to do no more than T learning (i.e., increases to its 
heuristic) while moving forward. Once the quota is exhausted, it will have to backtrack every time it learns 
and is allowed to move forward only when its heuristic is locally consistent (i.e., no learning is required). 
This is in line with existing algorithms SLA*T (Shue et al., 2001) and LRTS (Bulitko & Lee, 2006). 

Definition 12 We define the solution cost as the cumulative distance between the consecutive states on
the agent's stack upon reaching a goal state. Specifically, suppose that upon reaching a goal state the stack is
$\sigma_n = [s_0, \ldots, s_m]$. Then, the solution cost is $\sum_{j=1}^{m} \mathrm{dist}(s_{j-1}, s_j)$.

Note that solution cost is different from the number of moves on the first trial (Koenig & Simmons,
1992, 1993, 1996) or the execution cost (Bulitko & Lee, 2006). Specifically, there are two differences: we
do not take the cost of backtracking moves into account and we use shortest-path costs between states
on the stack. As noted above (Figure 4.3), only states in which the agent conducted planning are put on the
stack. Our definition assumes that the path segment computed within a single planning session is optimal
(i.e., the actual edge costs add to $\mathrm{dist}(s_{j-1}, s_j)$). This is a realistic assumption for many algorithms because
either they take a single move per planning session or their moves are guaranteed to be optimal within the
local search space expanded at each planning session.
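
As a concrete reading of Definition 12, the sketch below sums shortest-path distances between consecutive stack entries. It is an illustration only: the helper names, the dictionary graph encoding, and the unit-cost 4-connected grid are assumptions, chosen to mirror the stack $[(0, 0), (2, 1)]$ from the Figure 4.3 discussion.

```python
import heapq


def grid_graph(width, height):
    """Unit-cost 4-connected grid; states are (x, y) tuples."""
    graph = {}
    for x in range(width):
        for y in range(height):
            graph[(x, y)] = {
                (x + dx, y + dy): 1
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if 0 <= x + dx < width and 0 <= y + dy < height
            }
    return graph


def dist(graph, source, target):
    """Shortest-path cost from source to target (Dijkstra); inf if unreachable."""
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        d, s = heapq.heappop(queue)
        if s == target:
            return d
        if d > best[s]:
            continue  # stale queue entry
        for n, w in graph[s].items():
            if d + w < best.get(n, float("inf")):
                best[n] = d + w
                heapq.heappush(queue, (d + w, n))
    return float("inf")


def solution_cost(graph, stack):
    """Sum of dist(s_{j-1}, s_j) over consecutive states on the final stack."""
    return sum(dist(graph, a, b) for a, b in zip(stack, stack[1:]))


grid = grid_graph(3, 2)
cost = solution_cost(grid, [(0, 0), (2, 1)])  # two moves right, one move down
```

Because only planning states are stored on the stack, the intermediate cells (1, 0) and (2, 0) never appear in the sum; their cost is recovered through the shortest-path distance.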

We chose to use solution cost so defined as the performance measure because it provides an insight into 
the role of solution stack and backtracking in real-time heuristic search. As a consequence, we are able to 
derive upper bounds linear in the control parameter (T) as opposed to the tight bounds quadratic in the state 
space size (Koenig & Simmons, 1992, 1993, 1996). 

Definition 13 An instance $\pi$ of the search framework in Figure 4.2 is said to be a $\theta$-admissible search if it
satisfies Axiom 2 for a given value of $\theta$.

Note that Axiom 2 places no restrictions on the size of the local search space $\Gamma(\sigma_t)$. In other words,
search algorithms that are not real-time (i.e., that do not guarantee a move in a constant amount of time) are
covered by our analysis as well.

4.3 Examples 

We will now illustrate the new notation and definitions with several examples. 

Example 1 (Korf's LRTA*). Consider the four-state space in Figure 4.4. In our notation, the problem is
specified as $G = (\{A, B, C, D\}, \{(A, B), (B, C), (C, D), (D, C), (C, B), (B, A)\}, w)$ where the weight
function $w$ is 1 for all six edges. The goal set is $S_g = \{A\}$ and the start state is $s_0 = C$. The admissibility
weight is $\theta = 1$ and the learning quota is $T = \infty$. The initial heuristic function is defined as follows:
$h_{\text{init}}(A) = 0$, $h_{\text{init}}(B) = 1$, $h_{\text{init}}(C) = 1$, $h_{\text{init}}(D) = 0.7$. Korf's LRTA* with a lookahead of one
(Figure 2.1) constitutes a 1-admissible search and can be represented in our framework as follows:

1. the neighborhood $\Gamma(\sigma_t)$ is defined as the immediate neighbors of the current state. Note that there is no
left neighbor for state A and no right neighbor for state D;

2. the heuristic weight $\gamma(\sigma_t)$ is set to 1 for all $\sigma_t$ (thus, $\gamma = 1$);

3. the stack always grows by one state, either the left or the right neighbor of the current state:
$|\sigma_{t+1}| = |\sigma_t| + 1$ for all $t$. Namely, the agent goes to the immediate neighbor $s$ with the lowest $f$-value:
$f(s) = \gamma(\sigma_t) \, \mathrm{dist}(\mathrm{top}(\sigma_t), s) + h_t(s)$. The ties can be broken in any of the standard ways (e.g.,
randomly, systematically, or in a fixed order as done by Furcy & Koenig, 2000);

4. the heuristic is updated only in the current state as follows:

$h_{t+1}(\mathrm{top}(\sigma_t)) = \max\{ h_t(\mathrm{top}(\sigma_t)), \; \min_{s \in \Gamma(\sigma_t)} (\gamma(\sigma_t) \, \mathrm{dist}(\mathrm{top}(\sigma_t), s) + h_t(s)) \}$;   (4.8)

5. the learning amount is computed as follows:

$u_0 = 0$,
$\forall t \geq 0 \; [u_{t+1} = u_t + h_{t+1}(\mathrm{top}(\sigma_t)) - h_t(\mathrm{top}(\sigma_t))]$.   (4.9)

Figure 4.4: A four-state example. State A is the goal state and state C is the start state.

The following table shows the heuristic values (the value of the current state is marked with an asterisk), the
contents of the stack $\sigma_t$, the local search space $\Gamma(\sigma_t)$, and the learning amount $u_t$:

t    $h_t(A)$   $h_t(B)$   $h_t(C)$   $h_t(D)$   $\sigma_t$       $\Gamma(\sigma_t)$   $u_t$
0    0          1          1*         0.7        [C]              {B,D}                0
1    0          1          1.7        0.7*       [C,D]            {C}                  0.7
2    0          1          1.7*       2.7        [C,D,C]          {B,D}                2.7
3    0          1*         2.0        2.7        [C,D,C,B]        {A,C}                3
4    0*         1          2.0        2.7        [C,D,C,B,A]      {B}                  3

The solution cost is 4 as it is the cost of the path on the final stack: C → D → C → B → A.
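
The rows of the table can be reproduced by instantiating the framework with the update rules of Equations 4.8 and 4.9. The sketch below is an illustration rather than the authors' code; the function name `lrta_trace`, the row layout, and the first-wins tie-breaking are assumptions.

```python
def lrta_trace(graph, h_init, start, goals, gamma=1.0, max_steps=100):
    """Run LRTA* (lookahead one) inside the framework and log one row per cycle.

    Each row is (t, h_t, stack sigma_t, local search space Gamma(sigma_t), u_t).
    """
    h, stack, u, rows = dict(h_init), [start], 0.0, []
    for t in range(max_steps):
        s = stack[-1]
        rows.append((t, dict(h), list(stack), set(graph[s]), u))
        if s in goals:
            return rows
        # f(n) = gamma * dist(top, n) + h_t(n) over the immediate neighbors.
        f = {n: gamma * graph[s][n] + h[n] for n in graph[s]}
        best = min(f, key=f.get)          # assumption: first minimum wins ties
        new_h = max(h[s], f[best])        # Equation 4.8
        u += new_h - h[s]                 # Equation 4.9
        h[s] = new_h
        stack.append(best)                # the stack always grows by one state
    raise RuntimeError("step limit reached before a goal state")


# Four-state chain of Example 1: unit edge costs, goal A, start C.
chain = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1, "D": 1}, "D": {"C": 1}}
h_init = {"A": 0, "B": 1, "C": 1, "D": 0.7}
rows = lrta_trace(chain, h_init, "C", {"A"})
```

Floating-point arithmetic may leave tiny rounding residue in the learning amount, so comparisons against the table are best done after rounding to one decimal place.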

It is straightforward to show that any parameterization of the LRTS algorithm (Bulitko & Lee, 2006)
constitutes a $\theta$-admissible search in our framework with $\theta$ set to the value of $\gamma$ in LRTS. As a corollary,
LRTA* (Korf, 1990), weighted LRTA* (Shimbo & Ishida, 2003), SLA* (Shue & Zamani, 1993a, 1993b),
and $\gamma$-Trap (Bulitko, 2004) are covered by our framework as well. The inclusion is proper. The following
example shows that there are $\theta$-admissible policies that are not instances of LRTS.

Example 2 (Beyond LRTS). Consider the 4-state search space and the initial heuristic from Example 1.
The start state C is called a 1-trap (Bulitko, 2004) with respect to the immediate neighbors since the $f$-value
of the left neighbor is $f(B) = \mathrm{dist}(C, B) + h_{\text{init}}(B) = 2$, the $f$-value of the right neighbor is
$f(D) = \mathrm{dist}(C, D) + h_{\text{init}}(D) = 1.7$, and they both exceed the heuristic value of the start state: $h_{\text{init}}(C) = 1$.
In such cases, the agent may want to expand the search space until the current state is no longer a trap or an
upper bound $d_{\max}$ is reached. This can be implemented in the framework:

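$\Gamma(\sigma_t) = S(s_c, d)$   (4.10)
$s_c = \mathrm{top}(\sigma_t)$   (4.11)
$S(s_c, k) = \{s \in S \mid \|s_c, s\| = k\}$   (4.12)
$d = \min\{ d_{\max}, \; \|s_c, S_g\|, \; \min\{ k \mid h_t(s_c) \geq \min_{s \in S(s_c, k)} f(s) \} \}$   (4.13)
$f(s) = \gamma(\sigma_t) \, \mathrm{dist}(s_c, s) + h_t(s)$.   (4.14)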

Intuitively, $S(s_c, d)$ defines a depth $d$ full-width neighborhood which is guaranteed to be a separating set
unless $s_c \in S_g$. Additionally, we update the heuristic over the entire lookahead space:

$\forall s \in \Gamma(\sigma_t): \quad h_{t+1}(s) = \max\{ h_t(s), \; \max_{S' \in \mathfrak{D}(s, \Gamma(\sigma_t))} \min_{s' \in S'} f(s') \}$   (4.15)

where $f(\cdot)$ is defined in Equation 4.14. Note that all updates from $h_t$ to $h_{t+1}$ are done in parallel (called
synchronous backups by Barto, Bradtke, & Singh, 1995). The fact that the new value of each $s \in \Gamma(\sigma_t)$ is a
minimum $f$-value over a separating set $S' \in \mathfrak{D}(s, \cdot)$ maintains $\gamma$-admissibility of the heuristic. The distance
weight $\gamma(\sigma_t)$ is set to 1 for all $t$. We finalize the algorithm by setting $T$ to $\infty$, thereby disabling backtracking.
The resulting algorithm is a 1-admissible search in the sense of Axiom 2. On the other hand, the algorithm
goes beyond the LRTS framework due to dynamic search space growth (Equation 4.13) and multiple heuristic
updates (Equation 4.15).

In Example 2 above, we used the "max of mins" update rule of $\gamma$-Trap and LRTS (Equation 4.15), which
maintains $\gamma$-admissibility of the heuristic. The intuition lies with the fact that the minima are sought over
separating sets. This means that each minimum is computed over a set of states that an optimal path to every
goal state passes through. Clearly, it is safe to increase the heuristic value of the current node to the $f$-value of
any state on an optimal path as it will not violate (weighted) admissibility. Thus, it is safe to increase $h$
of the current state to the minimum $f$-value of any separating set. As the initial heuristic is $\theta$-admissible,
it makes sense to increase it as aggressively as possible (hence the max in the update rule). Note that the
classic LRTA* seeks the minimum of $f$-values over the depth $d$ frontier, which is a separating set. LRTS and
$\gamma$-Trap are more aggressive and look at a series of frontiers for the depth values of $1, \ldots, d$. They then select
the highest minimum as the new heuristic value of the current state.
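
A minimal sketch of the "max of mins" idea follows. It is not the LRTS or $\gamma$-Trap implementation: it assumes unit edge costs (so the breadth-first layer at depth $k$ is exactly the set of states at distance $k$), updates only the current state, and stops at the first layer that contains a goal so that every layer used is a separating set. The function name and parameters are hypothetical.

```python
def max_of_mins(graph, h, s, goals, max_depth, gamma=1.0):
    """New heuristic value for s: the largest per-frontier minimum f-value.

    Assumes unit edge costs, so the k-th breadth-first layer around s is the
    frontier at distance k and f(n) = gamma * k + h(n) for n in that layer.
    """
    if s in goals:
        return h[s]
    seen, layer, best = {s}, {s}, h[s]
    for k in range(1, max_depth + 1):
        # Next breadth-first layer: unseen successors of the previous layer.
        layer = {n for p in layer for n in graph[p] if n not in seen}
        if not layer:
            break
        seen |= layer
        best = max(best, min(gamma * k + h[n] for n in layer))
        if layer & goals:
            break  # deeper layers need not separate s from the closest goal
    return best


# Four-state chain of Example 1: unit edge costs, goal A.
chain = {"A": {"B": 1}, "B": {"A": 1, "C": 1}, "C": {"B": 1, "D": 1}, "D": {"C": 1}}
h_init = {"A": 0, "B": 1, "C": 1, "D": 0.7}
depth_one = max_of_mins(chain, h_init, "C", {"A"}, max_depth=1)
depth_two = max_of_mins(chain, h_init, "C", {"A"}, max_depth=2)
```

With a depth of one the rule coincides with the LRTA* update for state C; with a depth of two the second frontier {A} raises the value to the true distance to goal.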

Interestingly, the "max of mins" update rule is sufficient but not necessary to preserve $\theta$-admissibility
of the heuristic function. In Appendix A we show that even more aggressive $\theta$-admissible rules are possible.
More importantly, we derive a criterion for $\theta$-admissibility and construct a non-trivial upper bound on the
magnitude of the updates for any $\theta$-admissible learning rule.



5 Theoretical Analysis 

We start by proving that any $\theta$-admissible search (Definition 13) is complete.

Theorem 5.1 (Completeness). When $T < \infty$ or $|S| < \infty$, any $\theta$-admissible search $\pi$, $\gamma \leq \theta$, starting with
a $\theta$-admissible heuristic is complete.

Corollary 5.1 When $T < \infty$ or $|S| < \infty$, any $\gamma$-admissible search $\pi$ starting with a $\gamma$-admissible heuristic
is complete.

Consequently, in the rest of the paper, we will implicitly assume that $\theta \geq \gamma$ so that the search $\pi$ is
$\theta$-admissible. This assumption can always be satisfied by increasing $\theta$ to $\gamma$ if necessary.

5.1 Exponential Solution Cost 

The amount of backtracking an agent performs is determined by inaccuracies of the initial heuristic as well
as the learning quota control parameter ($T$). Is there a relation between $T$ and the solution cost produced by an
algorithm? Bulitko and Lee (2006) showed that LRTS, which prunes duplicate states from its solution stack,
has a solution cost linear in $T$ (cf. Section 6 for more discussion). Until now, it was not known whether this
upper bound is specific to LRTS. In this section we show that any $\theta$-admissible search has an upper bound
on the solution cost exponential in $T$. We then show that the bound is tight by demonstrating an example in
which the bound is achieved.

Theorem 5.2 (Exponential upper-bound). There exists a positive integer $A$ such that for any $\theta$-admissible
search $\pi$ and any start state $s_0$ the solution cost is upper-bounded by:



Theorem 5.3 (Example of exponential cost). There exists a search problem $G$ and a 1-admissible search
$\pi$, $T = 12m$, $\gamma = 1$, where $m$ is a positive integer, such that the solution cost is $9 \cdot 2^{m+1} - 5$ (i.e., exponential
in $T$).
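
To make the dependence on $T$ explicit, the cost in Theorem 5.3 can be rewritten by substituting $m = T/12$. This is plain arithmetic on the stated formula, not an additional result, and the sample values are included only as a sanity check:

```latex
\[
  9 \cdot 2^{m+1} - 5 \;=\; 18 \cdot 2^{m} - 5 \;=\; 18 \cdot 2^{T/12} - 5,
  \qquad T = 12m .
\]
% Sample values of the stated cost:
%   m = 1:  T = 12,  cost = 9 \cdot 4  - 5 = 31
%   m = 2:  T = 24,  cost = 9 \cdot 8  - 5 = 67
%   m = 3:  T = 36,  cost = 9 \cdot 16 - 5 = 139
```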

5.2 Linear Solution Cost 

In this section we define a subclass of the exponential class whose members have solution cost linear in T. 
We then demonstrate that this class is non-empty by constructively defining two families of policies that 
belong to the linear subclass (Figure 5.1). 




Figure 5.1: Structure of the set of $\theta$-admissible search algorithms. Region labels: search with exponential
solution cost; search with linear solution cost.

Definition 14 (Linear subclass). A search $\pi$ belongs to the linear class if its solution cost has an upper
bound linear in $T$. Specifically, a $\theta$-admissible search $\pi$ belongs to the linear subclass if there exist
non-negative numbers $a$, $b$, $c$ (that possibly depend on $\theta$, $\gamma$) such that the solution cost is upper-bounded by

$a \, \mathrm{dist}(s_0, S_g) + bT + c$.

While traversing the state space, a $\theta$-admissible search can visit a single state more than once. Such
re-visits happen either through backtracking moves (the stack shrinks) or forward moves (the stack grows). In
the latter case, several copies of the revisited state will be present on the search's path stack at once, thereby
forming a cycle in the solution.

Definition 15 A $\theta$-admissible search is called acyclic when its stack never contains multiple occurrences of
a state.

The LRTS algorithm of Bulitko and Lee (2006) is an example of acyclic search because it explicitly 
removes any cycles in its solution. 
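
One simple way to keep a path stack acyclic is to cut the stack back to the earlier occurrence whenever a state is revisited by a forward move. The sketch below only illustrates Definition 15; it is not claimed to be the mechanism LRTS actually uses, and the function name is hypothetical.

```python
def push_without_cycles(stack, state):
    """Return a new stack with `state` on top and no repeated states.

    If `state` is already on the stack, everything above its earlier
    occurrence is dropped, which removes the cycle from the solution.
    """
    if state in stack:
        return stack[: stack.index(state) + 1]
    return stack + [state]


# Replaying the forward moves C, D, C, B, A from Example 1.
stack = []
for visited_state in ["C", "D", "C", "B", "A"]:
    stack = push_without_cycles(stack, visited_state)
```

On this sequence the detour through D is cut out, leaving the stack [C, B, A].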

Theorem 5.4 Any acyclic $\theta$-admissible search belongs to the linear class.

Another example of search with a linear solution bound is achieved through limited backtracking, as 
follows. 

Definition 16 A θ-admissible search π is called piecewise backtracking if for a positive integer k the 
following conditions hold: 

1. the path stack is divided into a finite number of segments. Each segment, except possibly the last (i.e., 
top-most) segment, contains exactly k states. We will denote stack segment i by σ|_{b_i}^{e_i}, where s_{b_i} is 
the first state and s_{e_i} is the last state of the stack segment. For all except possibly the last segment, 
e_i - b_i = k - 1. The first segment begins with the start state: b_1 = 0; 

2. within each segment, every increase in the heuristic function (i.e., when h_t(s) < h_{t+1}(s)) results in a 
backtracking move as long as it does not bring the agent into the previous segment. In other words, if 
the current state s = top(σ_t) is the first state of a segment, then no move is taken at all when h_t(s) is 
increased (i.e., the agent stays put); 

3. every time the current stack segment (i.e., the segment containing the top of the stack) grows beyond 
k states, a new segment is started. Suppose its first state is top(σ) = s_{b_M}. Then the following quantity 
is computed: 

    \sum_{i=1}^{M-1} \left[ h(s_{b_{i+1}}) - h(s_{e_i}) + \mathrm{dist}(s_{e_i}, s_{b_{i+1}}) \right]    (5.1) 

Once this quantity exceeds T, the segment that was just started is declared final. This means that it can 
grow beyond k states. As with other segments, each increase in a heuristic value forces the agent to 
take a backtracking move within the segment. Backtracking is not allowed into the previous segments 
(i.e., past s_{b_M}). 

Figure 5.2 illustrates the definition. It shows the situation precisely upon starting a new segment M. 
The stack runs horizontally, left to right. Heuristic values are shown vertically. The trapezoids have a smooth 
upper edge, indicating consistency of the heuristic function on each segment. The sharp discontinuities 
between the heights of neighboring trapezoids represent the discrepancies in the heuristic function between 
neighboring segments. Together with the distances between the states they form the sum in Equation 5.1. 
We schematically show the last of the sum's terms (i.e., h(s_{b_M}) - h(s_{e_{M-1}}) + dist(s_{e_{M-1}}, s_{b_M})) with 
bidirectional arrows. 
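
The sum in Equation 5.1 can be computed directly from the segment boundaries. The sketch below assumes,
following the last term spelled out above, that the term for each pair of neighboring segments is
h(first state of the later segment) - h(last state of the earlier segment) + the distance between the two.
The function name, the representation of segments as (first_state, last_state) pairs, and the callables `h`
and `dist` are hypothetical choices made for illustration.

```python
from typing import Callable, Hashable, List, Tuple

State = Hashable


def boundary_discrepancy(
    segments: List[Tuple[State, State]],
    h: Callable[[State], float],
    dist: Callable[[State, State], float],
) -> float:
    """Sum over neighboring segments of h(b_next) - h(e_prev) + dist(e_prev, b_next).

    `segments` lists (first_state, last_state) for segments 1..M in stack order.
    """
    total = 0.0
    for (_, e_prev), (b_next, _) in zip(segments, segments[1:]):
        total += h(b_next) - h(e_prev) + dist(e_prev, b_next)
    return total
```

A caller would compare the returned value against T to decide whether the newly started segment is final.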

Theorem 5.5 A piecewise backtracking search π has solution cost linearly upper-bounded by: 

    3θ · dist(s_0, S_g) + 2T 

for any positive integer k. 



6 Related Work on Solution Cost Analysis 

Ishida and Korf (1991) showed that LRTA* is guaranteed to reach the goal in O(n^2) steps if the state space 
of size n has no identity actions. As Koenig and Simmons (1992, 1993, 1996) point out, this follows from 
the analysis of MTS if the target's position is fixed (Koenig, 1992). 

Koenig and Simmons (1992, 1993, 1996) extended the analysis to a sub-class of reinforcement learning 
algorithms of which LRTA* is a special case. They considered state as well as state-action value functions 
and two schemes of reinforcement: action penalty and goal rewards. For the action-penalty approach, they 
proved that the time complexity of the first trial is upper-bounded by O(n^3) for Q-learning algorithms and 
by O(n^2) for value-iteration algorithms. Both upper bounds are tight for zero-initialized heuristics. 

It can be shown that SLA*, discussed in the previous section, finds an optimal path by the time it arrives at 
the goal state. Furthermore, SLA* will have learnt perfect heuristic values for all states on such a path (Shue 
& Zamani, 1993a, 1993b). A subsequent algorithm, named SLA*T (Shue et al., 2001; Zamani & Shue, 
2001), introduced a learning quota parameter T which makes SLA*T behave identically to LRTA* (i.e., no 
backtracking) as long as the overall amount of heuristic updates is under T. As soon as this threshold is 
exceeded, SLA*T starts behaving identically to SLA* (i.e., backtracks on every heuristic update). Bulitko 
and Lee (2006) showed that the length of the solution found by SLA*T is upper-bounded by dist(s_0, S_g) + T, 
where T is the learning quota/learning threshold parameter. This result holds only when the path being 
built by SLA*T is processed after every move and all state revisits are removed. The downside of this 
requirement is that the pruning operation substantially increases the running time of the algorithm and can 
require a non-constant amount of time per move. 
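
The switch between the two regimes of SLA*T described above can be sketched as a small bookkeeping
object. This is an assumption-laden illustration rather than the published algorithm: the class name
`LearningQuota`, the method `record_update`, and the choice to treat the update that pushes the total past T
as already triggering a backtracking move are ours.

```python
class LearningQuota:
    """Tracks cumulative heuristic updates against a learning quota T."""

    def __init__(self, quota: float) -> None:
        self.quota = quota      # the parameter T
        self.total = 0.0        # overall amount of heuristic updates so far

    def record_update(self, old_h: float, new_h: float) -> bool:
        """Record an update of one state's heuristic value.

        Returns True if the agent should backtrack (SLA*-like behavior)
        and False if it should keep moving forward (LRTA*-like behavior).
        """
        increase = new_h - old_h
        if increase <= 0:
            return False        # no learning took place, nothing to backtrack on
        self.total += increase
        return self.total > self.quota
```

With a quota of 0 this sketch backtracks on every heuristic increase, and with an infinite quota it never
backtracks, which matches the two extremes named in the text.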

7 Future Work 

The analysis in this paper gives more insight into the effects of backtracking in real-time heuristic search. 
We hope it will help in designing high-performance real-time heuristic search algorithms that take advantage 
of backtracking. 

8 Conclusions 

Over the last two decades a number of extensions have been proposed for the original RTA*/LRTA* 
real-time heuristic search algorithm. One of the most radical extensions is backtracking, which has been studied 
primarily empirically. Consequently, the reported trends were highly sensitive to the testbed, insomuch 
as the effects observed in pathfinding on large maps were inconsistent with the effects observed on smaller 
maps (Bulitko & Lee, 2006). In this paper we presented the first entirely theoretical analysis of backtracking. 
In an attempt to make the results as general as possible, we imposed a minimal set of restrictions on the 
search algorithm. Yet, a tight non-trivial bound was derived on solution cost (exponential in the parameter 
controlling the amount of backtracking). We then imposed additional restrictions on the search algorithm 
and showed that they lead to linear solution costs. 

A Criterion of θ-admissibility 



It is fairly easy to come up with an example of a policy that does not satisfy the "max of min" condition and, 
consequently, does not preserve the θ-admissibility of the heuristic. Remarkably, heuristic updates more 
aggressive than those dictated by the "max of min" rule that do preserve θ-admissibility are also possible. 
An example of each phenomenon is found in Figure A.1. 



Figure A.1: Left: a three-state search problem with the current state s and the search neighborhood including 
the middle state. The heuristic values are shown in the states as h_t/h*. Each edge has the cost of 1. 
Increasing the heuristic in state s above the value of 2 dictated by the "max of min" rule will break 
1-admissibility of the heuristic. Right: The same three-state search problem where the value of state s can 
be increased to 2, thus exceeding the 1 dictated by the "max of min" rule. Yet the 1-admissibility will be 
preserved. 

It turns out that the "max of min" rule is a sufficient but not necessary condition for preserving 
θ-admissibility. However, the "max of min" rule can be strengthened into a necessary condition. 
Specifically, with a slight modification to the "max of min" condition based on Definition 17, the criterion 
in Theorem A.1 is proved. 



Definition 17 For any V ⊂ S, let us define the function h_t^V as: 

    h_t^V(s) = \max_{s' \in V} \{ h_t(s),\; h_t(s') - \theta \, \mathrm{dist}(s, s') \}    (A.1) 

where h_t(s) is the heuristic value of state s at time t. The idea underlying h_t^V(s) is that if the heuristic h_t(s) 
is θ-admissible in all states at time t, then for any state s it can actually be increased at least to the value of 
its arbitrary neighbor s' minus the shortest distance from s to s'. 
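
A direct reading of Definition 17 is sketched below. It assumes that the maximum ranges over all s' in V,
that the heuristic is stored in a dictionary, and that a shortest-distance function is supplied by the caller;
the names `raised_heuristic`, `h`, `dist`, and `theta` are illustrative and not from the paper.

```python
from typing import Callable, Dict, Hashable, Iterable


def raised_heuristic(
    s: Hashable,
    V: Iterable[Hashable],
    h: Dict[Hashable, float],
    dist: Callable[[Hashable, Hashable], float],
    theta: float = 1.0,
) -> float:
    """Return max(h[s], max over s' in V of h[s'] - theta * dist(s, s'))."""
    best = h[s]
    for s_prime in V:
        best = max(best, h[s_prime] - theta * dist(s, s_prime))
    return best
```

For instance, on a three-state chain with unit edges and h = {0: 0, 1: 1, 2: 5}, the raised value of state 0
with V = {1, 2} and θ = 1 is 5 - 2 = 3.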



Definition 18 We say that a search obeys a strengthened "max of min" condition if at any time t: 

    \forall s \in \Gamma^*(\sigma_t): \quad h_{t+1}(s) \le \max_{J \in D^*(s, \Gamma^*(\sigma_t))} \; \min_{s' \in J} \left[ \theta \, \mathrm{dist}(s, s') + h_t^{\Gamma^*(\sigma_t)}(s') \right]    (A.2) 

where Γ*(σ_t) is the union of all neighborhoods considered by the policy up to the time t. Formally, 
Γ*(σ_t) = ∪_{t' ≤ t} Γ(σ_{t'}). Additionally, D*(s, Γ) = D(s, Γ) ∪ {{s}}. The "raised" heuristic 
h_t^{Γ*(σ_t)}(s') comes from Definition 17. 



Theorem A.1 (maximum heuristic increase criterion). Suppose we start with a θ-admissible heuristic h 
such that h(s) = 0 ⟺ s ∈ S_g. Then as long as Γ*(σ_t) ∩ S_g = ∅, the condition of θ-admissibility 
(Equation (4.5)) and the strengthened "max of min" condition are interchangeable in the definition of a 
θ-admissible policy π. 